This assignment aims to solve the problems of Mini Challenge 2.
The global settings of the R code chunks in this post are as follows.
The following code installs (if needed) and loads the required R packages.
packages = c('DT', 'ggiraph', 'plotly', 'tidyverse', 'dplyr', 'readr', 'hrbrthemes')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The following code imports the raw data sets for Mini Challenge 2 ("car_assignments.csv", "cc_data.csv", "gps.csv", "loyalty_data.csv").
credit_debit <- read_csv("data/cc_data.csv")
loyalty_data <- read_csv("data/loyalty_data.csv")
car_assignment <- read_csv("data/car_assignments.csv")
GPS <- read_csv("data/gps.csv")
glimpse(credit_debit)
Rows: 1,490
Columns: 4
$ timestamp <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp <chr> "1/8/2014", "1/8/2014", "1/14/2014", "1/9/2014", ~
$ location <chr> "Carlyle Chemical Inc.", "Carlyle Chemical Inc.",~
$ price <dbl> 4983.52, 4901.88, 4898.39, 4792.50, 4788.22, 4742~
$ loyaltynum <chr> "L8477", "L5756", "L2769", "L3317", "L8477", "L57~
head(loyalty_data)
# A tibble: 6 x 4
timestamp location price loyaltynum
<chr> <chr> <dbl> <chr>
1 1/8/2014 Carlyle Chemical Inc. 4984. L8477
2 1/8/2014 Carlyle Chemical Inc. 4902. L5756
3 1/14/2014 Abila Airport 4898. L2769
4 1/9/2014 Abila Airport 4792. L3317
5 1/15/2014 Maximum Iron and Steel 4788. L8477
6 1/16/2014 Nationwide Refinery 4743. L5756
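One point worth noting from the two glimpses above: the credit card timestamps carry a time of day ("1/6/2014 7:28"), while the loyalty card timestamps are date-only ("1/8/2014"). A minimal base-R sketch (on toy rows standing in for cc_data, which in the real analysis would come from "data/cc_data.csv") of splitting such a timestamp into date and hour, which the "when are they popular" question needs:

```r
# Toy rows standing in for cc_data$timestamp.
cc <- data.frame(timestamp = c("1/6/2014 7:28", "1/6/2014 7:34"),
                 stringsAsFactors = FALSE)
# Parse month/day/year hour:minute into a proper date-time.
cc$parsed <- as.POSIXct(cc$timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
cc$Date   <- as.Date(cc$parsed)                    # calendar date
cc$Hour   <- as.integer(format(cc$parsed, "%H"))   # hour of day, 0-23
cc
```

The date-only loyalty timestamps mean loyalty transactions can only be analyzed at daily resolution, while credit card transactions also support hourly patterns.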
Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?
After glimpsing the structure of the credit and loyalty card data, a heat map is a good way to visualize the most popular locations and when they are popular. To create this graph, the loyalty card data must first be aggregated.
loyalty_data$count_event <- 1
aggregate_dataset <- loyalty_data %>%
  group_by(timestamp, location) %>%
  summarize(Frequency = sum(count_event))
head(aggregate_dataset)
# A tibble: 6 x 3
# Groups: timestamp [1]
timestamp location Frequency
<chr> <chr> <dbl>
1 1/10/2014 Abila Zacharo 7
2 1/10/2014 Albert's Fine Clothing 1
3 1/10/2014 Bean There Done That 5
4 1/10/2014 Brew've Been Served 14
5 1/10/2014 Brewed Awakenings 3
6 1/10/2014 Carlyle Chemical Inc. 2
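As an aside, the same aggregation can be written more compactly with dplyr::count(), which avoids the helper count_event column; a sketch on a toy data frame standing in for loyalty_data:

```r
library(dplyr)

# Toy rows standing in for loyalty_data.
toy <- data.frame(timestamp = c("1/10/2014", "1/10/2014", "1/11/2014"),
                  location  = c("Abila Zacharo", "Abila Zacharo", "Abila Zacharo"),
                  stringsAsFactors = FALSE)
# count() groups by the given columns and tallies rows in one step.
toy %>% count(timestamp, location, name = "Frequency")
```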
aggregate_dataset$timestamp <- as.Date(aggregate_dataset$timestamp, "%m/%d/%Y")
aggregate_dataset$Day <- format(aggregate_dataset$timestamp, format="%d")
head(aggregate_dataset)
# A tibble: 6 x 4
# Groups: timestamp [1]
timestamp location Frequency Day
<date> <chr> <dbl> <chr>
1 2014-01-10 Abila Zacharo 7 10
2 2014-01-10 Albert's Fine Clothing 1 10
3 2014-01-10 Bean There Done That 5 10
4 2014-01-10 Brew've Been Served 14 10
5 2014-01-10 Brewed Awakenings 3 10
6 2014-01-10 Carlyle Chemical Inc. 2 10
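The Day column extracted above gives the day of the month; for the "when are they popular" question, the day of the week can also be informative (e.g. weekday vs. weekend patterns). A small sketch using the same base-R date tools, on a few sample dates:

```r
# Sample dates in the same "%m/%d/%Y" format used above.
dates <- as.Date(c("1/10/2014", "1/11/2014", "1/12/2014"), "%m/%d/%Y")
data.frame(timestamp = dates,
           Day     = format(dates, "%d"),  # day of month, as above
           Weekday = weekdays(dates))      # day-of-week name
```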
aggregate_dataset <- aggregate_dataset %>%
  mutate(text = paste0("Location: ", location, "\n",
                       "Day of January: ", Day, "\n",
                       "Frequency: ", Frequency))
p <- ggplot(data = aggregate_dataset,
            aes(x = Day, y = location, fill = Frequency, text = text)) +
  geom_tile() +
  scale_fill_gradient(low = "light blue", high = "dark blue") +
  theme_ipsum() +
  theme(axis.text.y = element_text(size = 8))
ggplotly(p, tooltip = "text")
DT::datatable(aggregate_dataset)
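For the anomaly question, one simple cross-check is to compare per-location transaction totals between the credit card and loyalty card sets: locations that appear in only one set, or whose counts diverge sharply, are candidates for closer inspection. A sketch on toy counts (not the actual data; the real counts would come from aggregating cc_data and loyalty_data by location as above):

```r
# Toy per-location counts standing in for the two card data sets.
cc_counts      <- data.frame(location = c("Location A", "Location B"),
                             cc_n      = c(120, 5),
                             stringsAsFactors = FALSE)
loyalty_counts <- data.frame(location = "Location A",
                             loyalty_n = 115,
                             stringsAsFactors = FALSE)
# Full outer join keeps locations present in only one data set.
comparison <- merge(cc_counts, loyalty_counts, by = "location", all = TRUE)
comparison$loyalty_n[is.na(comparison$loyalty_n)] <- 0
comparison$gap <- comparison$cc_n - comparison$loyalty_n
comparison
```

A large gap is not automatically an error, since the two timestamp resolutions differ and not every customer uses a loyalty card, but it narrows down where to look.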